WO 00/23606 PCT/US99/24646
LONG TERMINAL REPEAT, ENHANCER, AND INSULATOR 
SEQUENCES FOR USE IN RECOMBINANT VECTORS 
BACKGROUND OF THE INVENTION 

The human endogenous retroviruses (HERVs) were inserted into the
germ cells of primates millions of years ago and have remained as an integral
part of the primate genomes during evolution. In addition to the proviruses,
solo LTRs are also dispersed throughout the human genome (Wilkinson et al,
1994; Lower et al, 1996). The solo LTRs contain the U3, R and U5 regions
(Temin, 1982) but no internal gag, pol and env genes. Together, the HERVs
and the solo LTRs comprise approximately 5% of the human genome and
belong to the category of middle repetitive DNAs characterized as
retrotransposons (A.F. Smit, 1996; Henikoff et al, 1997).

The ERV-9 proviruses, containing 30-50 members, constitute one of
many families of the HERVs (Wilkinson et al, 1993; Lower et al, 1996). In
addition to the proviruses, solo ERV-9 LTRs with a copy number of
3000-4000 have been found in the human genome (Henthorn et al, 1986; La
Mantia, 1991; Schlessiger, 1992). The ERV-9 retrotransposons were
inserted into the primate genome probably as early as ten million years ago
(Di Cristofano et al, 1995). The retrotransposons have been suggested to be
selfish DNAs irrelevant to the cellular functions of the hosts (Dolittle and
Sapienza, 1980). However, recent findings indicate that the enhancer and
promoter elements in the U3 region of the LTRs (Lenz et al, 1984; Speck et
al, 1990) initiate and promote the transcription of host genes located
immediately downstream of the LTRs and may thus serve relevant cellular
functions (Stravenhagen and Robins, 1988; Feuchter et al, 1992; Goodchild
et al, 1992; Ting et al, 1992; Schulte et al, 1996).

The human β-like globin genes consist of the embryonic ε, the fetal
Gγ and Aγ, and the adult δ and β genes located on Chromosome 11 in a
transcriptional order of 5' ε-Gγ-Aγ-δ-β 3' (Efstratiadis et al, 1980). The
transcription of these genes is regulated by the far upstream Locus Control
Region (LCR), which is defined by four erythroid-specific DNase I
hypersensitive sites, HS1, 2, 3 and 4 (Tuan et al, 1985; Forrester et al, 1987;
Grosveld et al, 1987; Dhar et al, 1990). The LCR between HS1 and HS4 is
present in other mammals from mouse to galago and comprises the major
functional component of the LCR (reviewed by Hardison et al, 1997). A
ubiquitous HS5 site has been identified further upstream of the HS1-4 sites
(Tuan et al, 1985; Dhar et al, 1990) in the apparent 5' boundary area of the
LCR.

Enhancer elements are cis-acting and increase the level of
transcription of an adjacent gene from its promoter in a fashion that is
relatively independent of the position and orientation of the enhancer
element. In fact, Khoury and Gruss, 1983, Cell 33:313, state that "the
remarkable ability of enhancer sequences to function upstream from, within,
or downstream from eukaryotic genes distinguishes them from classical
promoter elements ..." and suggest that certain experimental results
indicate that "enhancers can act over considerable distances (perhaps >10
kb)."

Enhancer elements have been identified in a number of viruses,
including polyoma virus, papilloma virus, adenovirus, retrovirus, hepatitis
virus, cytomegalovirus, herpes virus, papovaviruses, such as simian virus 40
(SV40) and BK, and in many non-viral genes, such as within mouse
immunoglobulin gene introns. Enhancer elements may also be present in a
wide variety of other organisms. Host cells often react differently to different
enhancer elements. This cellular specificity indicates that host gene products
interact with the enhancer element during gene expression.

Although gene replacement by homologous recombination could be
used instead of integrating vectors, this approach is not yet technically
practical because of the very low success rate of the homologous
recombination events and the inability to culture the pluripotent stem cells
required for this approach.

BRIEF SUMMARY OF THE INVENTION 
Disclosed are an enhancer, insulator, and promoter from the HS5
region in the 5' boundary area of the locus control region of human β-like
globin genes. These transcription control sequences can be used to control
expression of any desired gene of interest and can be used in any vector for
this purpose. The control sequences are derived from the area in and around
the U3 region of a solitary endogenous retrovirus 9 (ERV-9) long terminal
repeat (LTR).

Also disclosed are methods of expressing any gene of interest. For
this purpose, the control sequences can be operably linked to the gene of
interest (and operably linked to each other). The disclosed enhancers,
insulators, and promoters can also be used with any other control sequences.
Preferably, the control sequences are used in vectors to obtain expression of
a gene of interest in a cell, including cells in animals.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a diagram of the location and structure of the ERV LTR in
the boundary area of the β-globin LCR. The top line shows the human
β-like globin gene locus. Solid boxes are the embryonic ε-, fetal γ- and adult
δ- and β-globin genes. The vertical arrows indicate locations of the DNase I
hypersensitive sites HS1, 2, 3, 4 and 5. The hatched box 5' of the HS5 site
is a solo ERV-9 LTR. The hatched box 3' of the β-globin gene is a second
copy of the ERV-9 LTR located 30 kb 3' of the β-globin gene (Henthorn et
al, 1986; Anagnou et al, 1995). The middle line is the enlarged 5' boundary
area drawn to scale according to the 1 kb scale bar. Open, hatched and gray
boxes are respective locations of the HS5 site, ERV-9 LTR and an arbitrary
upstream region (Ups) which was used as a control sequence for the LTR in
reporter gene assays and RT-PCR studies. The bottom line is the structure of
the LTR. Short horizontal arrows are the 14 short tandem repeats in the U3
region. Solid bar is the R region. Long horizontal arrows are the three
longer repeats in the U5 region.

Figures 2A and 2B are the sequence of the 5'HS5 LTR in the 5'ε1.4
phage DNA clone from K562 cells (SEQ ID NO:1). The four bases GTAT
with the heavy overline and underline located at the 5' and 3' ends of the
LTR are the presumed integration site of the LTR in the human genomic
DNA. The horizontal arrows in U3 are the 14 tandem repeats of 37-41 bases
in the U3 region. Angled arrow is the presumed transcriptional initiation site
in the LTR, marking the beginning of the R region. The long horizontal
arrows in the U5 region are the three repeats of 70 bases in U5. Arrowheads
connected to dotted overlines are locations of the PCR primers used in DNA
PCR and RT-PCR studies discussed in Example 1. Directions of the
arrowheads are the 5' to 3' direction of the primers.

Figure 3 is a comparison of the sequences of the U3 repeats. The top
line is the organization of the four subtype U3 repeats 1, 2, 3 and 4 in 5'HS5
LTR. P is the promoter in the U3 region. In the middle are the sequences of
the subtype repeats 1, 2, 3 and 4 (SEQ ID NOs:8, 9, 10 and 11,
respectively). Underlined bases are the GATA, CCAAT, CACCC or
CCACC motifs. At the bottom are consensus sequences of the U3 repeats in
different ERV-9 LTRs. 5'HS5 (SEQ ID NO:12), 3'β (SEQ ID NO:13) and
LTR2 (SEQ ID NO:14) are the 5'HS5 LTR, the LTR at 25 kb 3' of the
β-globin gene (Henthorn et al, 1986; Anagnou et al, 1995), and the LTR in a
random human DNA clone (Lania et al, 1992), respectively. Lower case
letters separated by slashes indicate polymorphic bases in the U3 repeats.

Figure 4 is a sequence comparison of three U3 promoters and the
ε-globin promoter. At the top is the U3 promoter of the 5'HS5 LTR
(nucleotides 1194 to 1287 of SEQ ID NO:1). The overlined bases are the
equivalent of the TATA box (Strazzullo et al, 1994). Underlined bases are
the DNA motifs found also in the U3 repeats. Angled arrow is the
transcriptional initiation site in LTR2 (La Mantia et al, 1992; Strazzullo et al,
1994) and the presumed transcriptional initiation site in the 5'HS5 LTR. At
the bottom is the sequence alignment of the four promoters in the 5'HS5 LTR
(nucleotides 1194 to 1287 of SEQ ID NO:1), 3'β LTR (SEQ ID NO:2) and
LTR2 (SEQ ID NO:3), respectively. Dashes are DNA base deletions.

Figures 5A-5D are a sequence alignment of the normal human (Hu N;
nucleotides 624 to 1781 of SEQ ID NO:1), truncated human (Hu S; SEQ ID
NO:6) and gorilla (Gori; SEQ ID NO:7) LTRs. Majority bases represent
the consensus DNA sequence among the three LTRs (SEQ ID NO:5).
Numbers between two horizontal lines are the DNA base ruler with base 1
being the first base of the first U3 repeat in the LTRs. Vertical arrows are
the positions of the first base in the U3 repeats. Dots represent the same
bases in the human or gorilla DNAs as those in the consensus sequence.
Dashes represent base deletions. The GTAT bases at positions 1081-84
marked with heavy overline are the integration site of the 5'HS5 LTR in both
human and gorilla DNAs.

Figure 6 is a diagram comparing the structures of the 5'HS5 LTR in
the genomes of human and gorilla and in people of different racial lineages.
Hu N is the human LTR of the normal length with 14 U3 repeats. Hu S is
the human LTR of a shorter length with 11 U3 repeats. Gori is the gorilla
LTR with 5 U3 repeats. Numbers in parentheses are the total number of
bases in the LTRs including 140 bases of genomic DNA downstream of the
LTR insertion site (the GTAT bases) that were amplified by the PCR
primers. Bent lines in Hu S and Gori are deletions of three and nine
complete U3 repeats in the truncated human and gorilla LTRs, respectively.

Figure 7 is a diagram of the structure of recombinant CAT constructs.
LTR is a 1 kb LTR sequence. Ups is 1.2 kb of DNA upstream of the LTR
(see Figure 1). εp is a 200 bp ε-globin promoter. HS2 is a 0.74 kb HS2
enhancer. HS5 is a 1.2 kb sequence spanning the HS5 site.

Figure 8 is a graph of enhancer and promoter activities (in percent of
substrate converted) of the 5'HS5 LTR in recombinant CAT constructs
Ups-CAT, HS2-εp-CAT and LTR-CAT plasmids transiently transfected into
K562, MEL and HL60 cells. Percent Conv is percentage conversion of the
14C-chloramphenicol substrate by the CAT enzyme produced by the
transfected test plasmid after normalization with respect to a common level
of a co-transfected CMV-β-gal plasmid.

Figure 9 is a graph of enhancer and promoter activities (in percent of
substrate converted) of the 5'HS5 LTR in recombinant CAT plasmids
εp-CAT, HS2-εp-CAT, LTR-CAT, LTR-εp-CAT, HS5-εp-CAT and
LTR-HS5-εp-CAT integrated into the genome of K562 cells. Percent Conv is
the percentage conversion of the 14C-chloramphenicol substrate by the CAT
enzyme produced by the integrated plasmids after normalization with respect
to the per cell copy numbers of the plasmids.

Figure 10 is a diagram of the 5'HS5 LTR in normal human DNA with
14 U3 enhancer repeats. The four horizontal lines 1, 2, 3 and 4 represent the
anticipated RT-PCR fragments amplified respectively by Primer pairs 1-4,
synthesized according to the K562 sequence in Figure 2. Numbers below the
lines are the anticipated sizes in base pairs of the amplified cDNA fragments.

Figure 11 is a diagram of examples of constructs using the disclosed
enhancers and promoters.

Figure 12 is a diagram of examples of constructs using the disclosed
enhancers and promoters.

Figure 13 is a diagram of examples of constructs using the disclosed
insulators.

DETAILED DESCRIPTION OF THE INVENTION 

Transcription of the human β-like globin genes in erythroid cells is
regulated by the far-upstream locus control region (LCR). Five kilobases of
new upstream DNA were cloned and sequenced in order to define the 5'
border of the LCR. An LTR-retrotransposon belonging to the ERV-9 family
of human endogenous retroviruses was found in the apparent 5' boundary
area of the LCR. This ERV-9 LTR contains an unusual U3 enhancer region
composed of fourteen tandem repeats with recurrent GATA, CACCC and
CCAAT motifs. This LTR is conserved in human and gorilla, indicating its
evolutionary stability in the genomes of primates. In both recombinant
constructs and the endogenous human genome, the LTR enhancer and
promoter activate the transcription of cis-linked DNA preferentially in
erythroid cells.

Sequencing data of the 5' border region of the LCR reveal a solitary
ERV-9 LTR with the characteristics of a retrotransposon in a location near
the HS5 site (see Figure 1). This 5'HS5 LTR possesses an unusual
sequence feature in the U3 enhancer region, which is composed of fourteen
tandem repeats of a consensus DNA of 41 bases. These U3 repeats as well
as the downstream promoter contain recurrent GATA, CACCC and CCAAT
motifs. This LTR-retrotransposon is conserved with 98-99% sequence
identity in people of different races and in the gorilla, except that some
people have eleven instead of fourteen U3 repeats and the gorilla has only
five U3 repeats. Functional tests with the CAT reporter gene assays
demonstrate that the human 5'HS5 LTR activates the cis-linked CAT gene
and possesses enhancer and promoter activities in erythroid cells. In the
CAT reporter gene assays, the LTR also synergized with and activated the
cis-linked HS5 site. Consistent with these results, RT-PCR studies of
cellular RNAs isolated from human primary cells and cell lines indicate that
the endogenous LTR activates transcription of the downstream R, U5 and the
genomic DNA at a higher level in erythroid than in nonerythroid cells.
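The recurrence of the GATA, CACCC and CCAAT motifs in the U3 repeats can be illustrated with a short script. The following is only a minimal sketch: the function names are hypothetical, the motif list is taken from the paragraph above, and the test sequence is the subtype 1 repeat (SEQ ID NO:8) as listed elsewhere in this document. Both strands are scanned because a motif may lie on the complementary strand.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motifs(seq: str, motifs=("GATA", "CACCC", "CCAAT")):
    """Return (motif, strand, 1-based start) hits on both strands of seq.

    Positions always refer to the given (top) strand; a "-" hit means the
    motif reads 5' to 3' on the complementary strand at that location.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((motif, strand, start + 1))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


# Subtype 1 U3 repeat (SEQ ID NO:8) as listed in this document.
REPEAT_1 = "TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG"

for motif, strand, start in find_motifs(REPEAT_1):
    print(f"{motif} on {strand} strand at position {start}")
```

A plain substring search is used here only for illustration; it says nothing about whether a transcription factor actually binds at a reported position.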

Disclosed are enhancers, insulators, and promoters derived from the
HS5 region in the 5' boundary area of the locus control region of β-like
globin genes. These transcription control sequences can be used to control
expression of any desired gene of interest and can be used in any vector for
this purpose. The control sequences are derived from the area in and around
the U3 region of a solitary endogenous retrovirus long terminal repeat
(ERV-9 LTR).

Also disclosed are methods of expressing any gene of interest. For
this purpose, the control sequences can be operably linked to the gene of
interest (and operably linked to each other). The disclosed enhancers,
insulators, and promoters can also be used with any other control sequences.
Preferably, the control sequences are used in vectors to obtain expression of
a gene of interest in a cell, including cells in animals.

Current strategies for gene expression in mammals and mammalian
cells, especially gene therapy of hereditary or acquired blood diseases,
employ retrovirus-mediated gene-transfer techniques. One of the common
problems of this approach has been the extinction of the expression of the
transgenes by the long terminal repeats (LTRs) of the vector flanking the
therapeutic transgene and by the host sequences flanking the LTR-transgenic
cassette. The disclosed enhancers, derived from the powerful enhancer
discovered in the solitary LTR of the ERV-9 human endogenous retrovirus
located in the 5' border of the β-globin Locus Control Region, can alleviate
this problem. The ERV-9 LTR enhancer is most active in erythroid cells and
can thus be used to replace the LTR in the retroviral vector to avoid the
transcriptional silencing of the transgene and to boost the transcription of the
therapeutic transgene in erythroid progenitor cells. Another problem with
gene expression in animal and mammalian cells, interference from flanking
transcription, can be alleviated using the disclosed insulator. The disclosed
insulators are derived from a stretch of LTR DNA of 600 bases, which
has a very high G and C content of 70% and is located immediately
upstream of the ERV-9 LTR enhancer. The disclosed insulators can be used
to insulate expression cassettes, especially those to be inserted in the genome
of the host cell, from the transcriptional interference and silencing of the
flanking host sequences.
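The G and C content quoted above for the insulator region is a simple base-composition measure. As a minimal sketch, assuming the region is available as a plain string of bases, it could be computed as below. The function names and the toy sequences are illustrative assumptions and are not taken from SEQ ID NO:1.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases of seq."""
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    return sum(base in "GC" for base in bases) / len(bases)


def sliding_gc(seq: str, window: int = 100, step: int = 50):
    """Yield (1-based window start, GC fraction) along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start + 1, gc_fraction(seq[start:start + window])


# Toy placeholder sequence (NOT the insulator sequence of SEQ ID NO:1).
toy = "GCGCGCGATA"
print(f"GC content: {gc_fraction(toy):.0%}")
```

The sliding-window variant is included only to show how a locally GC-rich stretch might be located within a longer sequence; the window and step sizes are arbitrary.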

The solitary ERV-9 LTR sequence in the β-globin Locus Control
Region belongs to middle repetitive sequences in the human genome with a
haploid copy number of 3000-4000. The first copy of a solitary ERV-9 LTR
was reported in 1989. The functional significance of the ERV-9 LTRs
dispersed in the human genome may be to transcriptionally activate and thus
mark the cis-linked loci of hematopoietic genes and gene families in early
progenitor cells during ontogeny and hematopoietic lineage differentiation.
The specific function of the solo ERV-9 LTR located near the HS5 site in
the 5' border of the human β-globin locus control region (LCR) may be to
initiate transcription of the LCR during early stages of ontogeny, with this
transcription process of the LCR regulating the transcriptional activation of
the further downstream β-like globin genes during erythropoiesis.

Specifically disclosed are nucleic acid molecules comprising all or a
functional portion of the U3 enhancer (nucleotides 595 to 1193 of Figure 2;
nucleotides 595 to 1193 of SEQ ID NO:1), or modified forms of the U3
enhancer, where a functional portion is a portion of the U3 enhancer that
retains enhancer function. Also disclosed are nucleic acid molecules
comprising all or a functional portion of the U3 insulator (nucleotides 5 to
594 of Figure 2; nucleotides 5 to 594 of SEQ ID NO:1), or modified forms of
the U3 insulator, where a functional portion of the U3 insulator is a portion
of the U3 insulator that retains insulator function. Also disclosed are nucleic
acid molecules comprising (1) all or a functional portion of the U3 enhancer
(nucleotides 595 to 1193 of Figure 2; nucleotides 595 to 1193 of SEQ ID
NO:1), or modified forms of the U3 enhancer, operably linked to (2) all or a
functional portion of the U3 insulator (nucleotides 5 to 594 of Figure 2;
nucleotides 5 to 594 of SEQ ID NO:1), or modified forms of the U3
insulator, where a functional portion is a portion of the U3 enhancer that
retains enhancer function and where a functional portion of the U3 insulator
is a portion of the U3 insulator that retains insulator function.

Also disclosed are nucleic acid molecules comprising all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of Figure 2;
nucleotides 1194 to 1322 of SEQ ID NO:1), or modified forms of the U3
promoter, where a functional portion of the U3 promoter is a portion of the
U3 promoter that retains promoter function. Also disclosed are nucleic acid
molecules comprising (1) all or a functional portion of the U3 enhancer
(nucleotides 595 to 1193 of Figure 2; nucleotides 595 to 1193 of SEQ ID
NO:1), or modified forms of the U3 enhancer, operably linked to (2) all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of Figure 2;
nucleotides 1194 to 1322 of SEQ ID NO:1), or modified forms of the U3
promoter, where a functional portion is a portion of the U3 enhancer that
retains enhancer function and where a functional portion of the U3 promoter
is a portion of the U3 promoter that retains promoter function.

Also disclosed are nucleic acid molecules comprising the U3 R
region (nucleotides 1322 to 1380 of Figure 2; nucleotides 1322 to 1380 of
SEQ ID NO:1), or modified forms of the U3 R region. Also disclosed are
nucleic acid molecules comprising (1) all or a functional portion of the U3
enhancer (nucleotides 595 to 1193 of Figure 2; nucleotides 595 to 1193 of
SEQ ID NO:1), or modified forms of the U3 enhancer, operably linked to (2)
the U3 R region (nucleotides 1322 to 1380 of Figure 2; nucleotides 1322 to
1380 of SEQ ID NO:1), or modified forms of the U3 R region, where a
functional portion is a portion of the U3 enhancer that retains enhancer
function.

Also disclosed are nucleic acid molecules comprising (1) all or a
functional portion of the U3 enhancer (nucleotides 595 to 1193 of Figure 2;
nucleotides 595 to 1193 of SEQ ID NO:1), or modified forms of the U3
enhancer; operably linked to (2) all or a functional portion of the U3
insulator (nucleotides 5 to 594 of Figure 2; nucleotides 5 to 594 of SEQ ID
NO:1), or modified forms of the U3 insulator; and operably linked to (3) all
or a functional portion of the U3 enhancer (nucleotides 595 to 1193 of Figure
2; nucleotides 595 to 1193 of SEQ ID NO:1), or modified forms of the U3
enhancer; where a functional portion is a portion of the U3 enhancer that
retains enhancer function, where a functional portion of the U3 insulator is a
portion of the U3 insulator that retains insulator function, and where a
functional portion of the U3 promoter is a portion of the U3 promoter that
retains promoter function.
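The nucleotide ranges recited above can be collected into a small lookup table for extracting each control element from a sequence string. This is a sketch only: it assumes that the ranges are 1-based and inclusive, as is conventional for sequence listings, and the names `REGIONS`, `region_length` and `extract_region` are hypothetical. The placeholder sequence stands in for SEQ ID NO:1, which is not reproduced here.

```python
# Ranges as recited in the text (assumed 1-based, inclusive).
REGIONS = {
    "U3 insulator": (5, 594),
    "U3 enhancer": (595, 1193),
    "U3 promoter": (1194, 1322),
    "R region": (1322, 1380),
}


def region_length(name: str) -> int:
    """Return the number of bases spanned by the named region."""
    start, end = REGIONS[name]
    return end - start + 1


def extract_region(sequence: str, name: str) -> str:
    """Return the named region from a 1-based, inclusive coordinate range."""
    start, end = REGIONS[name]
    if end > len(sequence):
        raise ValueError(f"sequence is shorter than the end of {name}")
    return sequence[start - 1:end]


# Placeholder sequence standing in for SEQ ID NO:1 (NOT the real sequence).
placeholder = "ACGT" * 450

for name in REGIONS:
    print(name, region_length(name), len(extract_region(placeholder, name)))
```

Note that the promoter and R region ranges, as recited, share nucleotide 1322.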
Enhancers 

The disclosed enhancers have enhancer function. Enhancers function
to increase the transcription from promoters in proximity to the enhancer.
The disclosed enhancers, like many enhancers, can function both upstream
and downstream from a gene, and in either orientation. The disclosed
enhancers are, or are derived from, all or a functional portion of the U3
enhancer (nucleotides 595 to 1193 of Figure 2; nucleotides 595 to 1193 of
SEQ ID NO:1), or modified forms of the U3 enhancer, where a functional
portion is a portion of the U3 enhancer that retains enhancer function. The
disclosed enhancers can be combined with other transcription control
elements, including the disclosed insulators and promoters.

Disclosed are primate 5'HS5 ERV-9 LTR enhancers. In particular,
human and gorilla 5'HS5 ERV-9 LTR enhancers are disclosed. A preferred
form of enhancer is the U3 enhancer present on nucleotides 595 to 1193 of
Figure 2 (nucleotides 595 to 1193 of SEQ ID NO:1). The U3 enhancer is
made up of fourteen repeat units, where each repeat has one of the following
four sequences:

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ
ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ
ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ
ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID
NO:11).

Also disclosed are modified forms of the U3 enhancer where the
modified enhancer retains enhancer function. These include:

Enhancers having three or more repeats, where each repeat has one of
the following sequences:

TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG (SEQ
ID NO:12),

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ
ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ
ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ
ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID
NO:11).

Enhancers having three or more repeats, where each repeat has one of
the following sequences:

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ
ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ
ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ
ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID
NO:11).

Enhancers having three or more repeats, where each repeat has the
following sequence:

TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG (SEQ
ID NO:12).

Enhancers where the enhancer has from three to fourteen repeat units.

Enhancers where one or more of the repeat units of the enhancer are
deleted, one or more of the repeat units are replaced with a repeat unit of the
enhancer having a different sequence than the repeat unit that is replaced,
one or more repeat units of the enhancer are added to the enhancer, or a
combination of one or more of these modifications.
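SEQ ID NO:12 is written with the degenerate base symbols R, Y and D. The sketch below, which is an illustration and not part of the disclosed methods, assumes the standard IUPAC meanings of these symbols (R = A or G, Y = C or T, D = A, G or T) and reports the positions at which a 41-base repeat departs from that consensus. The function name is hypothetical. The 36-base subtype 4 repeat (SEQ ID NO:11) is omitted because a position-by-position comparison requires equal lengths.

```python
# Standard IUPAC nucleotide codes (assumed interpretation of R, Y and D).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

CONSENSUS = "TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG"  # SEQ ID NO:12

REPEATS = {
    "SEQ ID NO:8": "TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG",
    "SEQ ID NO:9": "TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG",
    "SEQ ID NO:10": "TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG",
}


def mismatch_positions(repeat: str, consensus: str = CONSENSUS) -> list:
    """Return 1-based positions where repeat is not allowed by consensus."""
    if len(repeat) != len(consensus):
        raise ValueError("repeat and consensus must have the same length")
    return [
        position
        for position, (base, code) in enumerate(
            zip(repeat.upper(), consensus), start=1
        )
        if base not in IUPAC[code]
    ]


for name, repeat in REPEATS.items():
    print(name, mismatch_positions(repeat))
```

A comparison of this kind only counts base differences from the consensus; whether a given variant repeat retains enhancer function would still have to be determined experimentally.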
The disclosed control sequences can be used, alone or in
combination, to express any gene of interest. For this purpose, the control
sequences can be operably linked to the gene of interest. Preferably, the
gene encodes a protein. Preferably, the control sequences are used in vectors
to obtain expression of a gene of interest in a cell, including cells in animals.
Preferred vectors include retroviral vectors, adenoviral vectors, and other
vectors suitable for gene expression in mammalian cells and/or suitable for
gene therapy. Many vectors are known and the disclosed control sequences
can be used in any of these vectors.

Also disclosed are cells transformed with vectors containing one or
more of the disclosed control sequences, that is, vectors containing one or
more of the disclosed enhancers, insulators, or promoters. Preferred cells are
eukaryotic cells, animal cells, and mammalian cells. Also disclosed is a
method of expressing a protein, the method comprising culturing cells
transformed with a vector containing one or more of the disclosed control
sequences operably linked to the gene. Also disclosed is a method of
expressing a gene in an animal, the method comprising introducing into the
animal cells transformed with a vector containing one or more of the
disclosed control sequences operably linked to the gene. Also disclosed is a
method of expressing a gene in an animal, the method comprising
introducing into cells of an animal a vector containing one or more of the
disclosed control sequences operably linked to the gene.
Insulators 

Insulators are nucleic acid segments that reduce or prevent the effect of
transcription from adjacent regions on the nucleic acid segment with which
the insulator is associated. The disclosed insulators preferably are
placed upstream of other control sequences and/or downstream of genes.
Insulators are preferably placed between different genes, transcription units,
or genetic domains to reduce or prevent interference of the adjacent
expression sequences. The disclosed insulators are, or are derived from, all
or a functional portion of the U3 insulator (nucleotides 5 to 594 of Figure 2;
nucleotides 5 to 594 of SEQ ID NO:1), or modified forms of the U3
insulator, where a functional portion of the U3 insulator is a portion of the
U3 insulator that retains insulator function.
Promoters 

Promoters are nucleic acid segments that mediate initiation of
transcription. The disclosed promoters are, or are derived from, all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of Figure 2;
nucleotides 1194 to 1322 of SEQ ID NO:1), or modified forms of the U3
promoter, where a functional portion of the U3 promoter is a portion of the
U3 promoter that retains promoter function.
Use Of Control Elements

The disclosed enhancers, insulators, and promoters can be used in a
variety of vectors and expression constructs to regulate and promote
transcription of genetic elements placed in the same constructs. The
disclosed control elements are preferably used in retroviral vectors to obtain
expression in mammalian cells, and especially to express genes in cells in, or
to be introduced into, animals (including humans) for gene therapy.
Specific examples of such uses are:

1. The 5'HS5 ERV-9 LTR and/or its component U3 enhancer, insulator,
and promoter, the R and the U5 regions can be used to replace the LTRs or
their equivalent U3, R and U5 regions of retroviral vectors designed for gene
therapy of hereditary or acquired hematological diseases including sickle cell
disease, thalassemias, leukemias and AIDS.

2. The U3 enhancer, insulator, and promoter, and the R region can be
used to activate (and/or insulate) in hematopoietic cells the transcription of a
cis-linked transgene in either viral or non-viral vectors. The host cells for the
transgene can be the hematopoietic stem cells, progenitor cells or mature
lineage differentiated cells such as the erythroid, myeloid or lymphoid cells.

3. Base mutations, and/or rearrangements and substitution of repeat
units, can be introduced into the U3 and R regions to enable the U3 enhancer
and promoter and the R region to work more efficiently in a specific
hematopoietic lineage such as the erythroid, myeloid or lymphoid lineage.
Design of the retroviral vectors and transgenic cassettes.

1. The disclosed enhancers, promoters, R region, and U5 region can be
used to replace the LTRs or their component U3, R and U5 regions of
retroviral vectors designed for gene therapy of hereditary or acquired
hematological diseases. The disclosed insulators can also be added to the
vector. The replacement can be in either the 5' or the 3' LTR or both the 5'
and 3' LTRs of an appropriate retroviral vector. Example constructs are
shown in Figure 11.

U3: the U3 enhancer and promoter of the 5'HS5 ERV-9 LTR
R: the R region of the 5'HS5 ERV-9 LTR
U5: the U5 region of the 5'HS5 ERV-9 LTR
U3E: the U3 enhancer of the 5'HS5 ERV-9 LTR
U3P, R and U5: the U3 promoter, R and U5 regions of appropriate
non-5'HS5 ERV-9 LTRs.
2. Constructs such as those shown in Figure 12 can be used to activate
the transcription of a cis-linked transgene spliced into either viral or non-viral
vectors in hematopoietic cells.

U3: the U3 enhancer and promoter of the 5'HS5 ERV-9 LTR
R: the R region of the 5'HS5 ERV-9 LTR
U5: the U5 region of the 5'HS5 ERV-9 LTR
U3E: the U3 enhancer of the 5'HS5 ERV-9 LTR
U3P: the U3 promoter of the 5'HS5 ERV-9 LTR
R and U5: the R and U5 regions of appropriate non-5'HS5 ERV-9 LTRs.
P: appropriate promoter other than the U3 promoter of the 5'HS5
ERV-9 LTR.

3. The disclosed insulators can be used to insulate integrated transgenes
in hematopoietic and non-hematopoietic cells from transcriptional
interference exerted by the host genome and/or elimination by the host
genome over time, so that the transgene can be efficiently transcribed from
its own enhancer and promoter and also can be stably integrated in the host
genome over time. Examples of constructs using the disclosed insulators are
shown in Figure 13. Such constructs will have improved expression
consistency and stability by limiting or eliminating the influence of flanking
transcription activities.

The U3 enhancer repeats of the 5'HS5 LTR can also be used to
identify transcription factors that bind to the enhancer. The transcription
factors bound by the DNA motifs in U3 repeats can be identified by
electrophoretic mobility shift assays (EMSA) with nuclear extracts isolated
from cells, such as K562 and placenta trophoblasts, and supershift assays
with antibodies against various known transcription factors. Such techniques
for use with other protein binding sites are well established and can be used
with the disclosed enhancers.

The genes encoding new transcription factors identified through this
process can then be cloned. The molecular architecture and activity of the
U3 enhancer complex can also be examined by site-directed mutagenesis of
the U3 repeats in test plasmids containing the Green Fluorescent Protein
(GFP) reporter gene, following transfection into cells, such as K562, CFU-E
and placental trophoblast cells.
Constructs and Vectors 

The disclosed control elements (that is, the disclosed enhancers,
insulators, and promoters) are useful for expression of any desired gene. For
this purpose, the disclosed control elements can be included in constructs and
vectors designed for expression of genes of interest. Many such vectors are
known. Preferred vectors are those for use in animal cells, and in particular,
those for use in mammalian cells.

Examples of vectors and delivery techniques that can be adapted for
use with the disclosed control elements are described in U.S. Patent Nos.
5,968,735, 5,965,440, 5,965,358, 5,932,210, 5,925,565, 5,888,820,
5,888,767, 5,886,166, 5,871,997, 5,866,696, 5,866,411, 5,858,744,
5,856,152, 5,837,503, 5,830,727, 5,817,492, 5,814,482, 5,811,260,
5,795,577, 5,789,244, 5,783,442, 5,770,400, 5,759,852, 5,756,264,
5,753,499, 5,744,133, and 5,710,037.
Gene Therapy 

The disclosed control elements can be used in vectors and constructs
for gene therapy. "Gene therapy" refers to the treatment of pathologic
conditions by the addition of exogenous nucleic acids to appropriate cells
within the organism. The disclosed control elements can be used to express
and increase the efficiency of expression of genes added in gene therapy.
The nucleic acids must be introduced into the cell, by transfection or
transduction, such that they remain functional within the cell. The
disclosed insulators can protect introduced genes from interfering
endogenous transcription at the site of insertion. For most gene therapy
strategies, the new nucleic acids are designed to function as new genes,
i.e., to code for new RNA or messenger RNA, which in turn codes for new
protein. Alternatively, therapeutic genes can produce antisense RNAs or
ribozymes, which can directly affect cellular or pathogen functions without
having to express protein from mRNA. Gene therapy can be directed towards
monogenic disorders like adenosine deaminase deficiency and cystic fibrosis
or to polygenic somatic disorders like cancer.

Human gene therapy has been successfully applied to correct genetic
diseases in adenosine deaminase deficiency (severe combined
immunodeficiency) (Approved Protocol, "Treatment of Severe Combined
Immunodeficiency Disease (SCID) Due to Adenosine Deaminase (ADA)
Deficiency with Autologous Lymphocytes Transduced with a Human ADA
Gene" Hum. Gene Ther. 1:327-362 (1990); Anderson, W.F. "Human Gene
Therapy" Science 256:808-813 (1992)) and familial hypercholesterolaemia
(Grossman, et al. Nature Genetics 6:335-341 (1994)). Many new gene therapy
protocols are in progress or being planned (Morgan and Anderson Ann. Rev.
Biochem. 62:191-217 (1993)). Vectors, constructs, and protocols described
in the studies above can be adapted for use with the disclosed control
elements.

The rapid implementation of gene therapy in human trials has been
made possible by the development of relatively efficient means of transferring
new nucleic acids into cells, a process generally referred to as "gene
transduction". The clinically applicable gene transduction methods fall into
one of three categories: (a) cationic lipids, (b) molecular conjugates and (c)
recombinant viruses. These different means of accomplishing gene
transfer have been reviewed by Morgan, Ann. Rev. Biochem. 62:191 (1993);
Mulligan Science 260:926 (1993); and Tolstoshev Ann. Rev. Pharm. Toxicol.
32:573 (1993). Any of these transfer systems can be used for
constructs using the disclosed control elements.

Most of the successful human gene therapy protocols utilize vectors
derived from defective murine leukemia retroviruses (Anderson Science
256:808-813 (1992); Miller Nature 357:455-460 (1992); Miller Curr. Top.
Microbiol. Immunol. 158:1-24 (1992); for review of these vectors and the
packaging cell lines, see Miller, Methods in Enzymology 217:581-599 (1993)).
Although there is a limitation in the size of the gene (up to 7 to 8 kb) that can
be transduced, the retrovirus based vectors have the advantage that they can
incorporate a permanent copy of the delivered gene into the chromosomes of
the recipient cells and therefore potentially can represent a cure for a disorder
arising due to the expression of an undesirable protein, activation of an
oncogene, or insufficient expression or expression of a defective protein. Due
to their retroviral origins, the disclosed control elements are particularly suited
for use in retroviral vectors.

The majority of the gene transfer procedures used to date for human
gene therapy rely on what is known as ex vivo gene transfer. The recipient
cells are removed from the patient and grown in a cell culture laboratory.
Replication-incompetent, virus-like particles containing the therapeutic
gene, which are produced from packaging cells, are used to transduce the
recipient cells. The transduced recipient cells are then selected by growing
in selection media, expanded and returned to the patient. The packaging
cells are genetically engineered cell lines that, once a therapeutic gene is
transferred into the cells, produce virus-like particles containing the
therapeutic gene to be delivered into other cells.

Other gene transferring vehicles in which the disclosed control
elements can be used are those based on human immunodeficiency virus
(HIV) (Poznansky, et al. J. Virol. 65:532-536 (1991); Buchschacher, et al. J.
Virol. 66:2731-2739 (1992); Shimada, et al. J. Clin. Invest. 88:1043-1047
(1991)) and adeno-associated virus (Chatterjee, et al. Science 258:1485-1488
(1992); Muzyczka Curr. Top. Microbiol. Immunol. 158:97-129 (1992)).

An HIV based delivery system is believed to be particularly suitable
for gene therapy against AIDS. Not only can the genes transferred by HIV
virus-based vectors be integrated into the genome of non-dividing cells
(Weinberg, et al. J. Exp. Med. 174:1477-1482 (1991); Bukrinsky, et al. Proc.
Natl. Acad. Sci. U.S.A. 89:6580-6584 (1992); Lewis, et al. [published erratum
appears in EMBO J. Nov;11(11):4249 (1992)] EMBO J. 11:3053-3058
(1992)), but the presence of HIV gp120 on the surface of the gene delivering
particles also renders them specific for gene delivery to CD4+ cells.

The U3 enhancer region in the 5'HS5 LTR contains an unusual sequence
of fourteen tandem repeats of 37-41 bases. The tandem repeats consist
of four subtypes, 1, 2, 3 and 4, which are arranged in the LTR in
the order 1-2-3-4-1-2-3-4-1-2-3-4-4-1. The consensus sequence of the U3
repeats (SEQ ID NO:12) reveals five conserved motifs: GATA, TAGCTCA,
GGTTTGT (or GGTGG/CCACC in subtype 4) and CCAAT. The motifs
GATA, CCAAT and CACC can potentially bind to cognate transcription
factors abundantly expressed in hematopoietic and erythroid cells.
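
The motif content described above lends itself to a simple computational check. The following is a minimal sketch, not part of the original disclosure, that reports where a motif occurs on either strand of a DNA sequence. The function names are hypothetical, and the toy sequence is merely assembled from the motifs named in the text; it is not SEQ ID NO:12.

```python
# Hypothetical helper functions for illustration only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_positions(seq, motif):
    """Return (0-based start, strand) for each motif occurrence on either strand.

    A "-" hit means the reverse complement of the motif was found at that
    position of the given (forward) strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Toy repeat unit (assumption): motifs from the text joined by arbitrary spacers.
TOY_REPEAT = "AGATAAGTAGCTCAAGGTTTGTGACACCAATCAGCATT"

if __name__ == "__main__":
    for motif in ("GATA", "TAGCTCA", "GGTTTGT", "CACC", "CCAAT"):
        print(motif, motif_positions(TOY_REPEAT, motif))
```

Applied to the actual U3 repeat sequences, a scan of this kind would list candidate GATA, CCAAT and CACC sites that could then be tested by EMSA.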

The consensus sequence of U3 repeats shows higher than 90%
sequence homology with that of the U3 repeats of the 3' ERV-9 LTR
located 25 kb 3' of the β-globin gene and of LTR2, a random clone of ERV-9
LTR (Figure 3).

The promoter sequence in the LTR is located in the U3 region at the
3' end of the U3 repeats and is immediately upstream of the transcribed R
region whose 5' border marks the transcriptional initiation site for retroviral
RNA synthesis. The promoter of the 5'HS5 LTR shows a sequence
homology of 80% with the promoter of the 3' LTR and of over 90% with
the promoter of LTR2. The transcriptional initiation site of LTR2 has been
determined by primer extension to be located 28 bases downstream of the
AATAAAA box. Because of extensive sequence homologies between the
5'HS5 LTR and the LTR2 promoters, especially the 100% sequence
homology in the 70 DNA bases flanking the AATAAAA box, the
transcriptional initiation site of the 5'HS5 LTR was placed at the identical T
base 28 bases downstream of the AATAAAA box. All three LTR promoters
contain the GATA, CACCC and CCAAT motifs at identical locations, -36,
-46 and -63 bases respectively, relative to the retroviral transcriptional
initiation site.

The 5'HS5 LTR promoter also bears structural similarities with the
promoters of the further downstream ε-, γ- and β-globin genes in that a
combination of similar GATA, CACCC and CCAAT motifs is found also
upstream of the AATAAAA boxes in the globin promoters. In particular, the
5'HS5 LTR and the ε-globin promoter share additional sequence homologies
in the region immediately 5' of the transcriptional initiation site. The above
homologies indicate that, like the globin promoters, the 5'HS5 LTR enhancer
and promoter ought to be active in erythroid cells. Indeed, transfection
assays show that the 5'HS5 LTR exhibits enhancer and promoter activities
and can promote the transcription of cis-linked DNA to relatively high levels
in erythroid cells and in placenta.

The consensus sequence of the modular U3 repeats in the 5'HS5 LTR
reveals that the modular U3 repeat contains five well conserved and recurrent
DNA motifs organized invariably in the following 5'->3' order: GATA,
TAGCTCA, GGTTTGT (or TGGTGGG in subtype 4) and
CACCAATCAGCA (nucleotides 25 to 36 of SEQ ID NO:12). This
invariable sequence structure suggests a definitive organization of the
cognate protein factors in the assembly of the U3 enhancer complex.

The GATA motifs bind to the GATA family of transcription factors
including GATA-1, -2 and -3. Targeted disruptions of the GATA-1, -2 and
-3 genes have been reported to cause severe abnormalities in hematopoiesis
and erythropoiesis, indicating that these factors play important regulatory
roles in erythroid cells. Different GATA factors are expressed at relatively
higher levels in different hematopoietic cells. In CD34+ hematopoietic
stem/progenitor cells, GATA-2 is expressed at a high level relative to
GATA-3 and GATA-1. In erythroid K562 cells, both GATA-1 and GATA-2
are expressed. In CFU-E, GATA-1 is the major detected GATA factor. In
placenta trophoblasts, GATA-2 and GATA-3 are expressed.

The CACCC motifs bind to erythroid transcription factors EKLF and
BKLF. EKLF is expressed at very low levels in K562 cells expressing the
embryonic globin program and at much higher levels in MEL cells
expressing the adult globin program. Unlike EKLF, BKLF is expressed
abundantly in embryonic yolk sac and fetal liver and is not confined to
erythroid cells. However, the motif in the U3 repeats is CACC and not
the CACCC found in the strong EKLF and BKLF binding sites, and may thus
bind to these factors weakly or bind to different factor(s).

The CCAAT motifs may bind to two families of protein factors, the
C/EBPs expressed in various hematopoietic cells and adipocytes and the
ubiquitous NF-Y complex. The C/EBP transcription factors include C/EBP
α, β, γ, δ, ε, and CHOP, a dominant negative inhibitor of the C/EBPs. They
bind to the CCAAT motifs as a homodimer or heterodimer through the b-ZIP
domain. The CCAAT boxes have been reported to play pivotal roles in the
activities of the globin promoters, suggesting the existence in erythroid cells
of transcription factors that bind to and activate the CCAAT boxes.
However, none of the C/EBP α, β, δ, and ε are present at detectable levels in
erythroid K562 cells and C/EBP γ, a ubiquitous factor, appears to be
expressed mainly in lymphoid cells. This suggests that in K562 cells the
CCAAT box may be bound paradoxically by negative regulators CHOP and
CDP or primarily by the ubiquitous NF-Y complex.

The NF-Y complex, also named CP1, consists of three subunits A, B
and C. All three subunits are required for binding to the CCAAT box as a
trimeric complex through the histone fold motif, which bears similarity to the
DNA binding domain of the histones. The NF-Y factors through the histone
fold domain may also associate with histone acetyltransferase and thus be
able to remodel and open up the chromatin structure of the CCAAT box and
its neighboring DNA. In EMSA gels with nuclear extract from erythroid
cells, after the NF-Y complex was supershifted with antibodies, the CCAAT
box containing probe still formed shifted complexes. This suggests that
erythroid cells may contain yet unidentified nuclear factors that may bind to
the CCAAT motifs in U3 repeats.

The remaining two conserved sequence motifs TAGCTCA and
GGTTTGT in the U3 repeats may also be bound by yet unidentified
transcription factors present in erythroid cells. It is of interest to note that
motifs similar to TAGCTCA are found also in enhancers and promoters of
genes expressed in various hematopoietic lineages: TAGCCTGA in the
MLV U3 enhancer, TAGCTAA in the promoter of M-CSF receptor gene and
TAGCTTCA in the Invariant Chain promoter of the major histocompatibility
complex.

The enhancers of many genes including the HS2 enhancer of the
β-globin LCR usually span several hundred bases and are bound by many
different protein factors, which makes the analysis of the enhancer complex a
complicated task. In contrast, the 14 modular U3 repeats in the 5'HS5 LTR
contain up to four well conserved DNA motifs and may be bound by a
similarly limited number of recurrent protein factors, making it a simpler
task to analyze the structure of this enhancer complex.

Example 

This example describes the cloning and characterization of the 5'
border region of the LCR upstream of human β-like globin genes.
MATERIALS AND METHODS

Isolation of 5' 1.4 phage clone and DNA sequencing: The 5' 1.4
phage clone spanning 12 kb of DNA 5' of the HS4 site was obtained from a
K562 genomic DNA library constructed in EMBL phage (Weber-Benarous
et al, 1988). The library was screened with a unique DNA probe, 5' 1.4,
located near the HS4 site in the LCR (Li et al, 1985). The genomic DNA
insert contained 8 kb of DNA comprising the region spanning the HS5 site,
whose sequence was subsequently reported (Yu et al, 1994), and 5 kb of new
DNA further upstream. The 8 kb of DNA was cleaved by Hind III into four
sub-fragments of 2.7 kb spanning the HS5 site and 1.5, 1.6 and 2 kb spanning
the new DNA. The sub-fragments were subcloned into a plasmid vector (Tuan
et al, 1990) and sequenced with the dideoxy terminator method (Sanger et al,
1977) using Sequenase or Taquenase Kit (USB Corp). This sequencing
strategy produced unambiguous DNA sequencing ladders for the entire 8 kb
of DNA except for the 1 kb of DNA in the junction area between the 1.5 and
1.6 kb subclones, which contained the repetitive sequences of the ERV-9 LTR.
The junction DNA was recloned into a phagemid vector Bluescript II SK(+/-)
(Stratagene) and the single stranded DNA was sequenced as above. The
sequences were assembled and analyzed using the GCG DNA analysis
software. The 8 kb DNA sequence was submitted to GenBank (BankIt 193637
AF064190).

Purification of genomic DNAs from the gorilla and people of
different races: Genomic DNAs were isolated anonymously from human
blood samples collected by the Hemoglobin Laboratory at the Medical
College of Georgia for diagnosis of thalassemia and sickle cell disease.
African samples were from patients homozygous for sickle cell disease or
Hereditary Persistence of Fetal Hemoglobin (HPFH). Arabic and Asian
samples were from people hemizygous for α-thalassemia and the Caucasian
samples were from normal individuals or patients with β-thalassemia. The
gorilla blood sample was obtained from the Yerkes Primate Center of Emory
University. High molecular weight genomic DNAs were purified from
nucleated blood cells (Poncz et al, 1982).

PCR-amplification of the 5'HS5 LTR in genomic DNAs and
sequence analysis of the amplified LTR: The 5'HS5 LTRs were amplified
from genomic DNAs with Primer pair 3 used also for RT-PCR (Figure 10;
forward primer, positions 595-616 and reverse primer, positions 1807-1831,
Figure 2; nucleotides 595 to 616 of SEQ ID NO:1). PCR conditions consisted
of an initial denaturation at 95°C for 1.5 min, followed by 32 cycles of
denaturation at 95°C for 1.5 min, annealing at 59°C for 1 min and extension
at 72°C for 2 min, and a final extension step at 72°C for 15 min. The
amplified LTR fragments were purified by Quantum Plasmid Miniprep Kit
(Bio-Rad) and sequenced by the Molecular Biology Core Laboratory of the
Medical College of Georgia using the cycle sequencing technique with
fluorescent dideoxy terminators.

Construction of recombinant CAT plasmids: LTR-CAT
(Construct 1): The 1 kb LTR was amplified from K562 genomic DNA by
PCR with forward primer 5' TACT GTCGAC CTGAGTTTGCTGGGGATG 3'
(positions 3250-3271 in the 8 kb GenBank sequence, BankIt 193637
AF064190, corresponding to positions 595-616 in Figure 2; nucleotides 595
to 616 in SEQ ID NO:1) and reverse primer 5'
GAT GGATCC TGTGTCCGGAATTGGTGG 3' (positions 4282-4299 in
GenBank sequence; positions 1677-1694 in Figure 2; nucleotides 1677 to
1694 in SEQ ID NO:1). A Sal I and a Bam HI cloning site (underlined) were
added respectively to the forward and reverse primers. The PCR fragment
was cleaved with Sal I and Bam HI enzymes and, together with a Bam HI-
Hind III adapter, was spliced into a promoterless CAT vector derived from
εp-CAT (Construct 3) in which the ε-globin promoter (εp) was removed with
Sal I and Hind III digestions. Ups-CAT (Construct 2) contains a 1 kb PCR
fragment amplified from the genomic DNA located 2 kb further upstream of
the LTR and was created with the same cloning strategy. The respective
forward and reverse primers were 5'
ACT GTCGAC TTATGTATTCAAGTTCG 3' (positions 50-66 in GenBank
sequence; SEQ ID NO:21) and 5'
GAT GGATCC AATAGATTTTTGTCATCT 3' (positions 1203-1220 in
GenBank sequence; SEQ ID NO:22). εp-CAT (Construct 3) and HS2-εp-CAT
(Construct 4) were previously made (Tuan et al, 1989). LTR-εp-CAT
(Construct 5) was created with the above 1 kb LTR DNA obtained by PCR,
which was cleaved at the Sal I and Bam HI cloning sites and spliced into
εp-CAT (Construct 3), which was also cleaved at the Sal I and Bam HI sites
located 5' of the εp. HS5-εp-CAT (Construct 6) was created with the same
cloning strategy as LTR-εp-CAT (Construct 5). The 1.2 kb HS5 fragment
was generated by PCR from forward primer 5'
ACT GTCGAC AAGCTTCTGACAAATTATTCTT 3' (positions 5431-5455,
GenBank sequence; SEQ ID NO:15) and reverse primer 5'
GAT GGATCC ACTGAAAGGGCTCATGCAAC 3' (positions 6657-6676,
GenBank sequence; SEQ ID NO:16). LTR-HS5-εp-CAT (Construct 7) was
made from LTR-εp-CAT (Construct 5), which was linearized at the Bam HI
site 3' of the LTR. The above 1.2 kb HS5 fragment obtained by PCR was
cleaved at the 5' end with Hind III (a natural site) and at the 3' end with Bam
HI and, together with a Bam HI-Hind III adapter, was spliced into the Bam HI
site in LTR-εp-CAT.

Transient and stable transfections and CAT assays: Transfection
host cells K562, HL60 and MEL were cultured and transfected as
described (Tuan et al, 1989) with modifications. In transient transfections,
10 µg of each of the above CAT plasmids was mixed with 5 µg of a
reference CMV β-gal plasmid and transfected into the host cells by
electroporation. CAT assays were carried out as described (Tuan et al, 1989)
with two modified normalization steps. The CAT extracts were
normalized first with respect to the total protein in the extract, determined
with the BCA (bicinchoninic acid) protein kit (Pierce), and then with respect
to the β-galactosidase level of the co-transfected CMV β-gal plasmid, to
ensure that the CAT assays of different samples were carried out on extracts
containing similar levels of β-gal activity and, therefore, similar amounts of
the transfected test plasmids. The β-gal enzyme levels were determined with
the β-gal Assay Kit (Promega). The CAT enzymatic activities were
analyzed by thin layer chromatography and quantified with a
PhosphorImager (Molecular Dynamics). The results were presented as
percentages of conversion calculated from the 14C counts in the acetylated
chloramphenicol divided by the total input 14C counts of the chloramphenicol
substrate. In stable transfection, pooled cell populations were studied. The
CAT activities were normalized with respect to the copy numbers of the
integrated plasmids determined by Southern blots.
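
The quantification described above reduces to two small calculations, sketched here in Python for illustration. The function names and the example counts are hypothetical and are not taken from the reported experiments; the sketch covers only the percent-conversion step and the copy-number normalization used for stable transfections, not the protein and β-gal normalization of the extracts.

```python
# Hypothetical helper functions; all numbers used with them are illustrative.

def percent_conversion(acetylated_counts, total_input_counts):
    """Percent of input 14C chloramphenicol recovered in acetylated forms."""
    if total_input_counts <= 0:
        raise ValueError("total input counts must be positive")
    return 100.0 * acetylated_counts / total_input_counts


def activity_per_copy(percent, copy_number):
    """CAT activity normalized to integrated plasmid copy number.

    This corresponds to the normalization described for stable transfections,
    where the copy number is estimated from Southern blots.
    """
    if copy_number <= 0:
        raise ValueError("copy number must be positive")
    return percent / copy_number


if __name__ == "__main__":
    # Illustrative counts only (assumption): 2500 of 10000 input counts acetylated.
    conversion = percent_conversion(2500, 10000)
    print(conversion, activity_per_copy(conversion, 5))
```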

Isolation of total cellular RNAs and RT-PCR: Total cellular
RNAs were purified from freshly harvested, non-transfected human erythroid
K562, promyelocytic HL60, embryonic teratocarcinoma N-Tera (obtained
from ATCC) and murine erythroleukemia MEL cell lines, adult human
peripheral blood CFU-E and T-lymphocytes (Wickrema et al, 1992) and full
term human placenta. The RNAs were purified with the Totally RNA Kit
(Ambion). For a semi-quantitative comparison of the RT-PCR bands
generated by different primer pairs, each RNA was first reverse transcribed
with random hexamers as primers into a cDNA master stock,
which was then aliquoted into separate tubes for PCR with different primer
pairs as described (Kong et al, 1997). The 5'->3' sequences of the
respective forward and reverse primers are marked in Figure 2. Primer pair
1: CTGAGTTTGCTGGGGATGCGAA (positions 595-616; SEQ ID NO:17)
and GATTTAGTGACTCATATTGTTTCTGA (positions 1700-1726; SEQ
ID NO:18); Primer pair 2: TGCTGCTGCTCACTGTTTGGGTCTA
(positions 1349-1373; SEQ ID NO:19) and the reverse primer was the same
as that of Primer pair 1. Primer pairs 3 and 4 contain the same forward
primers as the respective forward primers of Primer pairs 1 and 2. Primer
pairs 3 and 4 contain a common reverse primer:
5' GGGCACTCTGCCTTAGGGAGTAACA 3' (positions 1807-1831; SEQ
ID NO:20). The human β-actin primer pair was obtained from Stratagene.
Before RT-PCR, the abilities of the primer pairs to produce amplification
fragments were confirmed by PCR with genomic DNA templates.
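
The primer coordinates given above determine the fragment size each pair should produce on a genomic DNA template. The short Python sketch below works this out; it is an illustration rather than part of the original methods. It assumes that the quoted positions follow the Figure 2 numbering, that the first number of the forward primer range and the last number of the reverse primer range mark the outer ends of the amplicon, and that the template is colinear with Figure 2. The names used are hypothetical.

```python
# Outer coordinates (forward primer start, reverse primer end) taken from the
# positions quoted in the text; Figure 2 numbering is assumed.
PRIMER_PAIRS = {
    1: (595, 1726),
    2: (1349, 1726),
    3: (595, 1831),
    4: (1349, 1831),
}


def amplicon_length(forward_start, reverse_end):
    """Length in bp of the fragment spanning both primers, ends inclusive."""
    if reverse_end < forward_start:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_end - forward_start + 1


if __name__ == "__main__":
    for pair, (start, end) in sorted(PRIMER_PAIRS.items()):
        print("Primer pair", pair, amplicon_length(start, end), "bp")
```

Under these assumptions Primer pair 3 spans 1237 bp, which agrees with the approximately 1.2 kb fragment amplified from genomic DNA with this pair.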
RESULTS

An LTR-retrotransposon of the ERV-9 family of human
endogenous retroviruses is located proximal to the HS5 site in the 5'
boundary area of the LCR: In order to study the sequence and function of
DNA in the boundary area of the LCR, a K562 DNA library was screened
(Weber-Benarous et al, 1988), yielding a clone containing 8 kb of DNA
sequence that spans the HS5 site and 5 kb of new DNA further upstream. As
the sequence features of the upstream DNA were previously unknown, the
5 kb new DNA as well as the 3 kb DNA spanning the HS5 site was
sequenced (GenBank accession number: BankIt 193637 AF064190). The
DNA sequence of the 3 kb DNA spanning the HS5 site is in general
agreement with the DNA sequence of this region reported earlier (Yu et al,
1994), except for a number of polymorphic base differences. In the new
DNA, sequence matches using the GCG and BLAST programs revealed the
existence of a solitary LTR at a location within 2 kb 5' of the HS5 site (Long
et al, 1995) (Figure 1). Comparison with a few selected homologous
sequences in the GenBank data base, including the LTR sequence located 5'
of the ZNF80 protein gene (Di Cristofano et al, 1995, GenBank Accession
No. X83497), showed that the 5'HS5 LTR spans 1.7 kb of DNA (Figure 2)
and belongs to the ERV-9 family of human endogenous retroviruses (La
Mantia et al, 1991; Lania et al, 1992).

Consistent with a common property of the retrotransposons, the
5'HS5 LTR is flanked by 4 bases of direct repeats GTAT in the genomic
DNA immediately 5' and 3' of the LTR sequence (Figure 2). This indicates
that the 5'HS5 LTR was inserted into the human ancestral genome at the
GTAT site sometime during evolution. In line with the general LTR
structure of mammalian retroviruses (Temin, 1982), the 5'HS5 LTR contains
the U3, R and U5 regions and is bracketed by the dinucleotides TG and CA
respectively at the 5' and 3' ends (Figure 2). The U3 region contains the
viral enhancer spanning tandemly repeated DNA sequences and the viral
promoter (Lenz et al, 1984; Golemis et al, 1990; La Mantia et al, 1991;
Anagnou et al, 1995). The R region starts with the viral transcription
initiation site (La Mantia et al, 1992) and is followed by the U5 region
(Figure 1). In the U3 region, the 600 DNA bases preceding the U3 repeats
are comprised of 70% G and C bases. This GC-rich region is found in many
of the homologous ERV-9 LTRs in the data base but is not present in the
LTR of the ERV-9 provirus (La Mantia et al, 1991). The U3 enhancer
repeats and the promoter in the 5'HS5 LTR show 80-90% base identities
with other ERV-9 LTRs found in the human genome (Yang et al, 1983; La
Mantia et al, 1991; Lania et al, 1992; Di Cristofano et al, 1995).

It is of interest to note that in addition to the 5'HS5 LTR located
approximately 25 kb 5' of the ε-globin gene, another ERV-9 LTR is located
at a position approximately 25 kb 3' to the β-globin gene (Figure 1). The
repetitive DNA in the region 3' of the β-globin gene was first reported by
Henthorn et al (1986) and subsequently studied by Anagnou et al (1995).
Although neither of those groups recognized that the repetitive DNA was
part of an endogenous LTR, sequence matches as shown above revealed that
the repetitive DNA of this region bears sequence identities of 80-90% with
the U3, R and U5 regions of the 5'HS5 LTR. Thus, two copies of the ERV-9
LTRs exist in flanking positions of the β-globin gene cluster.

Sequence analysis of the U3 enhancer region in the 5'HS5 ERV-9
LTR: The U3 enhancer region of the 5'HS5 LTR shows an interesting
sequence structure. It is comprised of fourteen tandem repeats of a
consensus DNA sequence of 37-41 bases (Figure 2). Sequence matches
show that the tandem repeats are comprised of four subtypes 1, 2, 3 and 4,
which are arranged in the LTR in the order 1-2-3-4-1-2-3-4-1-2-3-4-4-1
(Figure 3). Among the four subtypes, the sequence identities are 60-80%,
using subtype 2 as the reference. Among the U3 repeats of each subtype, the
sequence identities are 80-98% (Figure 3). The consensus sequence of the
fourteen U3 repeats (Figure 3) reveals recurrent sequence motifs that can
potentially bind to the GATA (Ko and Engel, 1993; Merika and Orkin,
1993), CCAAT (Johnson and McKnight, 1989) and CACCC (Miller and
Bieker, 1993; Crossley et al, 1996) transcription factors. Altogether, the U3
enhancer region contains within 600 bases of DNA eight GATA, nine CCAAT,
three CACCC and four CCACC sites. The consensus sequence of the
fourteen U3 repeats shows higher than 90% sequence identity with that of
the seven U3 repeats in the 3'β LTR (Henthorn et al, 1986) and of the six U3
repeats in LTR2, a random clone of the ERV-9 LTR (Lania et al, 1992)
(Figure 3).

Sequence analysis of the U3 promoter region: The promoter
sequence in the LTR is located in the U3 region at the 3' end of the fourteen
U3 repeats. It is located immediately upstream of the R region whose 5'
border marks the transcriptional initiation site for retroviral RNA synthesis
(Temin, 1982) (Figure 3). The promoter of the 5'HS5 LTR shows a
sequence homology of 80% with the promoter of the 3'β LTR and of over
90% with the promoter of LTR2 (Figure 4). The transcriptional initiation
site of LTR2 has been determined by primer extension to be located 28 bases
downstream of the AATAAAA box (La Mantia et al, 1992; Strazzullo et al,
1994). Because of extensive sequence identities between the 5'HS5 LTR
and the LTR2 promoters, especially the 100% sequence homology in the 70
DNA bases flanking the AATAAAA box, the presumptive transcriptional
initiation site of the 5'HS5 LTR was placed at the identical T base 28 bases
downstream of the AATAAAA box (Figure 4). All three LTR promoters
contain the GATA, CACCC and CCAAT motifs at identical
locations, -36, -46 and -63 bases respectively, relative to the retroviral
transcriptional initiation site (Figure 4).

The 5'HS5 LTR promoter also bears structural similarities with the
promoters of the further downstream ε-, γ- and β-globin genes (Baralle et al,
1980; Shen et al, 1981; Poncz et al, 1983; Li et al, 1985) in that a
combination of similar GATA, CACCC and CCAAT motifs is found also
upstream of the AATAAAA boxes in the globin promoters (Nienhuis et al,
1984). In particular, the LTR promoter and the ε-globin promoter share
additional sequence identities in the region immediately 5' of the
transcriptional initiation site (Figure 4). The above sequence and structural
homologies suggest that, like the globin promoters, the 5'HS5 LTR promoter
would be active in erythroid cells.

The 5'HS5 ERV-9 LTR is conserved in the genomes of the
gorilla and of people of different racial lineages: As the 5'HS5 LTR is
apparently a retrotransposon and is located not near but far upstream of the
β-like globin genes, it was possible that the 5'HS5 LTR might have resulted
from a recent insertional event in the K562 genome during cell culture and
did not serve a relevant cellular function. However, were this the case, the
5'HS5 LTR would not be present in the genome of the gorilla, which diverged
from the human genome approximately 10 million years ago (Sibley and
Ahlquist, 1987), nor in the genomes of people of different racial lineages,
which diverged approximately 100,000 years ago (Vogel and Motulsky,
1986). To examine this issue, PCR was used to detect the presence or
absence of the 5'HS5 LTR in the genomic DNAs isolated from the blood
samples of the gorilla and people of different races. The PCR primers were
synthesized according to the K562 DNA sequence and amplified 1.2 kb of
the 5'HS5 LTR including 130 bases of genomic DNA downstream of the LTR
(see Methods and Figure 2).

The PCR results indicate that the 5'HS5 ERV-9 LTR is conserved in
the genomes of the gorilla and people across racial lines. Fifteen out of a
total of 17 human DNAs isolated from Africans, Arabs, Asians and Caucasians
and from human cell lines K562 and HL60 produced amplicons of the
anticipated length of 1.2 kb. However, two of the nine African DNAs
produced either a shorter amplicon of 1.1 kb or both a longer 1.4 kb and a
shorter 1.1 kb amplicon, while the gorilla DNA produced an even shorter
amplicon of 0.9 kb (Figure 6).

It was possible that the observed amplicons might be spurious PCR
products amplified by the primer pair from other ERV-9 LTRs in the human
or the gorilla genome, since the 5' primer was located within the U3 region
immediately upstream of the enhancer repeats, a region present also in
some of the other ERV-9 LTRs, even though the 3' primer was located in the
unique genomic DNA region (see Figure 2). Therefore, the authenticity of
the amplicons was further confirmed by DNA sequencing. Four standard
amplicons of 1.2 kb from two Caucasian and two African DNAs, two shorter
amplicons of 1.1 kb from the African DNAs, and the 0.9 kb amplicon of the
gorilla DNA were sequenced (Figure 5). The electropherograms of the DNA
sequences showed sharp DNA sequence ladders with only a couple of
ambiguities where two different bases occupied the same sequence positions,
indicating that the two homologous chromosomes contained base
polymorphism at these positions. All the sequenced amplicons showed base
identities of 98-99% in both the LTR and the 3' flanking genomic DNA; the
only exception was the smaller number of U3 repeats in some people and in
the gorilla (Figures 5 and 6). If the sequenced amplicons contained
amplification products generated also from other homologous ERV-9 LTRs,
the electropherograms would have contained too many sequence ambiguities
to generate clearly readable sequences. The above observations indicate that
the amplicons were genuine products of the 5'HS5 LTR in the human and
gorilla genomes.

In both the shorter human amplicons containing eleven U3 repeats,
the deletion of three complete U3 repeats was apparently generated by the
same in-phase deletion event, so the subtype organizations of both amplicons
were identical, 1-2-3-4-1-2-3-4-1-2-1 (Figures 5 and 6). In the gorilla
amplicon with five U3 repeats, the subtype organization is 1-2-3-4-1 (Figures
5 and 6). The apparent genomic insertion site of the LTR, the GTAT
sequence, is conserved in both the human and gorilla amplicons (Figure 5).
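
The repeat organizations above can be turned into approximate repeat-array lengths. The following is a minimal sketch, not part of the original analysis: the repeat lengths are the <211> values of SEQ ID NOs 8-11 in the sequence listing, and the mapping of subtypes 1-4 onto SEQ ID NOs 8-11 in that order is an assumption made only for illustration. The names SUBTYPE_LENGTHS and repeat_array_length are hypothetical.

```python
# Repeat lengths in bases, taken from the <211> entries of SEQ ID NOs 8-11.
# ASSUMPTION: subtypes 1-4 correspond to SEQ ID NOs 8-11 in that order.
SUBTYPE_LENGTHS = {1: 41, 2: 41, 3: 41, 4: 37}


def repeat_array_length(organization: str) -> int:
    """Sum the repeat lengths for an organization string such as '1-2-3-4-1'."""
    subtypes = [int(s) for s in organization.split("-")]
    return sum(SUBTYPE_LENGTHS[s] for s in subtypes)


human_short = "1-2-3-4-1-2-3-4-1-2-1"  # eleven repeats (shorter human amplicons)
gorilla = "1-2-3-4-1"                   # five repeats (gorilla amplicon)

print(repeat_array_length(human_short))  # 443
print(repeat_array_length(gorilla))      # 201
```

Under these assumptions the two repeat arrays differ by 242 bases, which is in line with the roughly 0.2 kb difference between the 1.1 kb human and 0.9 kb gorilla amplicons.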

The remarkable sequence identities in the 5'HS5 LTR between
human and gorilla and among people of different races indicate that this LTR
was probably inserted into the 5' boundary area of the β-globin LCR at least
10 million years ago, before the divergence of the human and apes, and it has
been conserved in the genomes of the higher primates during the ensuing
years of evolution. These observations indicate that this 5'HS5
LTR-retrotransposon is likely conserved for the preservation of a relevant
cellular function of the host.

The 5'HS5 ERV-9 LTR possesses enhancer and promoter
activities in erythroid cells: To demonstrate that the enhancer and promoter
regions in the 5'HS5 LTR possess enhancer and promoter activities, seven
recombinant CAT plasmids were made (Figure 7). LTR-CAT (Construct 1)
contained the 1 kb LTR spanning the 14 U3 enhancer repeats, U3 promoter,
R and U5 spliced 5' of the CAT gene in the absence of a promoter in the
vector. To determine whether other regions of the 5' boundary area of the
LCR also possessed enhancer and promoter activities, the control Ups-CAT
plasmid (Construct 2) contained a 1 kb DNA (Ups) located further upstream
of the LTR (Figure 1). The HS2-εp-CAT plasmid (Construct 4) that
contained the strong HS2 enhancer of the LCR (Tuan et al, 1989) coupled to
the ε-globin promoter (εp) served as the standard with which to compare the
enhancer and promoter activities of the 5'HS5 LTR. To test if the enhancer
in the 5'HS5 LTR can synergize with and activate the HS5 site located naturally
downstream of and proximal to the LTR, LTR-εp-CAT, HS5-εp-CAT and
LTR-HS5-εp-CAT (Constructs 5, 6 and 7, Figure 7) contained respectively the
LTR and HS5 site spliced either separately or together into εp-CAT
(Construct 3). The plasmids were transiently transfected into erythroid K562
and MEL cells and nonerythroid HL60 cells and stably integrated into K562
cells.

Transient transfection results indicate that in human erythroid K562
cells, the LTR in the LTR-CAT plasmid displayed enhancer and promoter
activities that were approximately 50% of those of the combination of the HS2
enhancer and the ε-globin promoter in the HS2-εp-CAT plasmid. In
contrast, in murine erythroid MEL cells and human nonerythroid HL60 cells,
both LTR-CAT and HS2-εp-CAT displayed much lower enhancer and
promoter activities (Figure 8). The low enhancer activity of the HS2
enhancer in MEL cells was due apparently to the inactivity of the cis-linked
embryonic ε-globin promoter in MEL cells expressing the adult globin
program; when linked to the more permissive adult β-globin promoter, the
HS2 enhancer displayed much higher enhancer activity in MEL cells
(Cavallesco and Tuan, 1997). Likewise, the U3 enhancer in the LTR may
also be potentially active in MEL cells; its apparently low enhancer activity
may be due to the low activity in MEL cells of the U3 promoter, which shares
certain sequence identities with the ε-globin promoter (Figure 4).

When stably integrated into the genome of K562 cells, the LTR
displayed enhancer and promoter activities that were approximately 30% of
those of the HS2-εp-CAT plasmid (Figure 9). However, in the integrated
LTR-HS5-εp-CAT plasmid, the LTR enhancer synergized with the HS5 site and
activated the CAT gene to a level comparable to that displayed by the HS2
enhancer in HS2-εp-CAT (Figure 9). These results indicate that the 5'HS5
LTR possesses enhancer and promoter activities in erythroid cells and that it
synergized with and activated the HS5 site.

The endogenous 5'HS5 LTR activates the transcription of
downstream DNA preferentially in erythroid cells: It was next
determined if the endogenous 5'HS5 LTR also exhibits enhancer and
promoter activities and can activate the transcription of the downstream R
region and the flanking genomic DNA in the β-globin LCR. The
transcriptional statuses of the 5'HS5 LTR and downstream genomic DNA
were determined by RT-PCR in erythroid K562 and nonerythroid
T-lymphocytes and placental cells.

Four PCR primer pairs were made (Figure 10). Primer pair 1 was
synthesized to determine if the entire LTR between the U3 enhancer and the
U5 regions as well as the genomic DNA immediately downstream of it was
transcribed. Primer pair 2 was synthesized to detect retroviral mRNA
transcripts of the R and U5 regions whose synthesis was activated by the U3
enhancer and promoter. In order to ensure that Primer pair 2 detected the
RNA transcribed specifically from the 5'HS5 LTR and not RNAs transcribed
from other ERV-9 LTRs, the forward primer was located in the R region that
contains a number of polymorphic bases among the ERV-9 LTRs (Figure 2;
Henthorn et al, 1986 and Lania et al, 1992) and the reverse primer was located
in the genomic DNA immediately downstream of the LTR. Primer pairs 3
and 4 were synthesized to confirm that the RNAs detected by Primer pairs 1
and 2 were indeed transcribed from the 5'HS5 ERV-9 LTR. These two primer
pairs contained the same two respective forward primers as Primer pairs 1
and 2 but shared a common reverse primer located in the genomic DNA 110
bases further downstream of the reverse primer of Primer pairs 1 and 2.
Hence, the authentic RT-PCR bands of the 5'HS5 LTR generated by these
primer pairs would be 110 bases longer than those generated respectively by
Primer pairs 1 and 2 (Figure 10).

Consistent with the design of the primer pairs (Figure 10), the sizes
of the RT-PCR bands produced by Primer pairs 3 and 4 were indeed longer
by 110 bases than those produced by Primer pairs 1 and 2. This indicates
that the RT-PCR bands generated by Primer pairs 1-4 were genuine products
amplified from the 5'HS5 LTR and not from other ERV-9 LTRs in the
human genome. In addition, the authenticity of the PCR band produced by
Primer pair 3 had been confirmed by direct DNA sequencing (Figure 5).
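
The size check described above can be expressed as a small sketch. The 110-base offset comes from the primer design in the text; the band sizes, the tolerance for gel sizing error, and the names EXPECTED_SHIFT and shift_matches are hypothetical and serve only to illustrate the comparison.

```python
# Offset between the two reverse primers, as stated in the primer design.
EXPECTED_SHIFT = 110


def shift_matches(short_band: int, long_band: int, tolerance: int = 10) -> bool:
    """Return True if the longer band exceeds the shorter one by about 110 bases.

    `tolerance` is an assumed allowance for gel sizing error, in bases.
    """
    return abs((long_band - short_band) - EXPECTED_SHIFT) <= tolerance


# Hypothetical band sizes in bases (not measured values from the study).
print(shift_matches(300, 410))  # True: the shift equals the design offset
print(shift_matches(300, 300))  # False: no shift, so not the expected product
```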

For a semi-quantitative comparison of the intensities of RT-PCR
bands generated by Primer pairs 1-4 in different RNA samples, a β-actin
primer pair spanning a region in the ubiquitous β-actin mRNA, assumed to be
expressed at a constant level in different cell types, was included in the
RT-PCRs. Consistent with this assumption, the intensities of the β-actin band
generated by the same amount of different RNAs were similar. The relative
intensities of the LTR bands with respect to the intensity of the β-actin band
generated from aliquots of the same cDNA master stock as the LTR bands
(see Methods) were then compared.
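
The normalization described above amounts to dividing each LTR band intensity by the β-actin band intensity from the same cDNA stock. The following minimal sketch illustrates that ratio; the intensity readings and sample names are hypothetical placeholders, not data from the study, and relative_intensity is an illustrative helper.

```python
def relative_intensity(ltr_band: float, actin_band: float) -> float:
    """Return the LTR band intensity relative to the beta-actin band."""
    if actin_band <= 0:
        raise ValueError("beta-actin band intensity must be positive")
    return ltr_band / actin_band


# Hypothetical densitometry readings (arbitrary units): (LTR band, actin band).
readings = {
    "sample_A": (80.0, 100.0),
    "sample_B": (20.0, 100.0),
}

for sample, (ltr, actin) in readings.items():
    print(sample, relative_intensity(ltr, actin))
```

Comparing ratios rather than raw intensities keeps the comparison meaningful only while the β-actin signal is itself similar across samples, which is the assumption stated in the text.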

The RT-PCR results indicate that the endogenous 5'HS5 LTR
promoted the transcription of the R and U5 regions. In both erythroid and
nonerythroid cells, Primer pairs 2 and 4 generated amplification bands of the
R and U5 regions. However, the LTR enhancer and promoter appear to be
more active in erythroid than in nonerythroid cells, as the amplification
bands generated from RNAs of K562 cells and CFU-E were relatively
stronger than those of nonerythroid T-lymphocytes, N-Tera and HL60 cells.
An apparent exception to the above observation was the nonerythroid
placenta, which also generated strong LTR bands. This may be due to
contamination of the placenta by abundant maternal and fetal blood erythroid
cells in which the 5'HS5 LTR enhancer and promoter were active. On the
other hand, the 5'HS5 LTR enhancer and promoter may also be active in the
placenta, since many HERVs and their solitary LTRs have been found to be
capable of initiating viral RNA synthesis from the R region in placental cells
(Wilkinson et al, 1994; Lower et al, 1996).

Further upstream of the R region in the LTR, no additional
transcriptional initiation sites appear to exist in the majority of the cell types
tested, since Primer pairs 1 and 3 did not generate detectable bands from
RNAs of erythroid K562 and nonerythroid T-lymphocytes, N-Tera and
HL60 cells. However, Primer pairs 1 and 3 generated faint amplification
bands from erythroid CFU-E and nonerythroid placenta RNAs. This
suggests that CFU-E and placenta may contain additional transcriptional
initiation sites proximal to the 5'HS5 LTR.

The above RT-PCR results indicate that the endogenous 5'HS5 LTR
possesses apparent enhancer and promoter activities and is capable of
promoting the transcription of the R and U5 regions in the LTR and of
the genomic DNA further downstream in the LCR.

DISCUSSION

This example shows that a solitary ERV-9 LTR with the
characteristics of a retrotransposon is located proximal to the HS5 site in the
apparent 5' boundary area of the β-globin LCR. This 5'HS5 ERV-9 LTR
possesses unusual sequence features in the multiple tandem repeats of the U3
enhancer region. The U3 repeats and the immediately downstream U3
promoter contain within 700 DNA bases nine GATA, four CACCC and ten
CCAAT sites. These DNA motifs can bind respectively to the cognate
GATA (Orkin, 1992) and CACCC (Miller and Bieker, 1993; Crossley et al,
1996) transcription factors expressed abundantly in erythroid cells and to the
CCAAT factors C/EBP (Johnson and McKnight, 1989) and NF-Y (Bi et al,
1997), expressed in many hematopoietic and nonhematopoietic cells. The
high concentration of these motifs in the U3 region suggests that the 5'HS5
ERV-9 LTR may be preferentially active in erythroid cells.
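
A motif tally of the kind reported above can be sketched as follows. This is an illustration only: the text does not state how the sites were counted, so scanning both strands and allowing overlapping matches are assumptions, and the toy sequence is hypothetical rather than taken from SEQ ID NO:1. The names count_sites and reverse_complement are illustrative.

```python
MOTIFS = ("GATA", "CACCC", "CCAAT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, motif: str) -> int:
    """Count occurrences of a motif on either strand, overlaps included.

    ASSUMPTION: a site on the complementary strand is counted by matching
    the motif's reverse complement on the given strand.
    """
    seq = seq.upper().replace(" ", "")
    targets = {motif, reverse_complement(motif)}
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] in targets)


toy = "GATAAGCCAATTATC"  # hypothetical sequence for illustration
for motif in MOTIFS:
    print(motif, count_sites(toy, motif))
```

The same function could be applied to the U3 portion of the sequence listing once the spacing and position numbers are stripped; whether it reproduces the reported counts depends on the counting convention the authors used.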

The 5'HS5 LTR is conserved in the gorilla and in people of different
racial lineages, indicating that this LTR was probably inserted into its
location at the 5' boundary area of the LCR before species divergence
between human and gorilla approximately 10 million years ago. The
conservation of the 5'HS5 LTR during evolution of the higher primates
suggests that this LTR-retrotransposon may serve a relevant cellular function
of the host.

Functional tests with the CAT reporter gene assays show that the
5'HS5 LTR, in line with its component sequence motifs, possesses enhancer
and promoter activities preferentially in erythroid cells. Moreover, the LTR
enhancer activity can synergize with and activate the cis-linked HS5 site in
the LCR.

CLAIMS

1. A nucleic acid molecule comprising all or a functional portion of the
U3 enhancer (nucleotides 595 to 1193 of SEQ ID NO:1), wherein a functional
portion is a portion of the U3 enhancer that retains enhancer function.

2. The nucleic acid molecule of claim 1 further comprising all or a
functional portion of the U3 insulator (nucleotides 5 to 594 of SEQ ID NO:1)
operably linked to the enhancer, wherein a functional portion of the U3 insulator
is a portion of the U3 insulator that retains insulator function.

3. The nucleic acid molecule of claim 1 further comprising all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of SEQ ID
NO:1) operably linked to the enhancer, wherein a functional portion of the U3
promoter is a portion of the U3 promoter that retains promoter function.

4. The nucleic acid molecule of claim 1 further comprising the U3 R
region (nucleotides 1322 to 1380 of SEQ ID NO:1) operably linked to the
enhancer.

5. The nucleic acid molecule of claim 1 further comprising a gene
operably linked to the enhancer.

6. The nucleic acid molecule of claim 2 wherein the gene encodes a
protein.

7. A vector comprising the nucleic acid molecule of claim 6. 

8. A vector comprising the nucleic acid molecule of claim 5. 

9. The vector of claim 8 wherein the vector is a retroviral vector. 

10. A cell transformed with the vector of claim 8.

11. The cell of claim 10 wherein the cell is a mammalian cell.

12. The cell of claim 11 wherein the cell is a cell in an animal.

13. A method of expressing a protein, the method comprising culturing
the transformed cell of claim 7, wherein the protein encoded by the gene
is expressed.

14. A method of expressing a gene in an animal, the method comprising 
introducing the transformed cell of claim 10 into an animal, wherein the gene is 
expressed. 

15. A method of expressing a gene in an animal, the method comprising
introducing the vector of claim 8 into cells of an animal, wherein the gene is
expressed.

16. A nucleic acid molecule comprising all or a functional portion of the
U3 insulator (nucleotides 5 to 594 of SEQ ID NO:1), wherein a functional
portion is a portion of the U3 insulator that retains insulator function.

17. The nucleic acid molecule of claim 16 further comprising all or a
functional portion of the U3 promoter (nucleotides 1194 to 1322 of SEQ ID
NO:1) operably linked to the insulator, wherein a functional portion of the U3
promoter is a portion of the U3 promoter that retains promoter function.

18. The nucleic acid molecule of claim 16 further comprising the U3 R
region (nucleotides 1322 to 1380 of SEQ ID NO:1) operably linked to the
insulator.

19. The nucleic acid molecule of claim 16 further comprising a gene
operably linked to the insulator.

20. The nucleic acid molecule of claim 19 wherein the gene encodes a 
protein. 

21. A vector comprising the nucleic acid molecule of claim 20. 

22. A vector comprising the nucleic acid molecule of claim 19. 

23. The vector of claim 22 wherein the vector is a retroviral vector. 

24. A cell transformed with the vector of claim 22. 

25. The cell of claim 24 wherein the cell is a mammalian cell. 

26. The cell of claim 25 wherein the cell is a cell in an animal. 

27. A method of expressing a protein, the method comprising culturing
the transformed cell of claim 21, wherein the protein encoded by the gene
is expressed.

28. A method of expressing a gene in an animal, the method comprising 
introducing the transformed cell of claim 24 into an animal, wherein the gene is 
expressed. 

29. A method of expressing a gene in an animal, the method comprising 
introducing the vector of claim 22 into cells of an animal, wherein the gene is 
expressed. 

30. A nucleic acid molecule comprising a modified U3 enhancer, wherein
one or more of the repeat units of the enhancer are deleted, one or more of the
repeat units are replaced with a repeat unit of the enhancer having a different
sequence than the repeat unit that is replaced, one or more repeat units of the
enhancer are added to the enhancer, or a combination of one or more of these
modifications, wherein the modified enhancer retains enhancer function.

31. A nucleic acid molecule comprising an enhancer, wherein the
enhancer has three or more repeats, wherein each repeat has one of the following
sequences:

TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG (SEQ ID NO:12),

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID NO:11).

32. The nucleic acid molecule of claim 31, wherein each repeat has one
of the following sequences:

TATCTAGCTCAGGGATTGTAAATACACCAATCGGCAGTCTG (SEQ ID NO:8),

TGTCTAGCTCAAGGTTTGTAAACACACCAATCAGCACCCTG (SEQ ID NO:9),

TATCTAGCTCAGGGTTTGTGAATGCACCAATCAACACTCTG (SEQ ID NO:10),

TGTCTAGCTACTCTGTGGGGACGTGGAGAACCTTTA (SEQ ID NO:11).

33. The nucleic acid molecule of claim 31, wherein each repeat has the
following sequence:

TRTCTAGCTCADGGTTTGTRAAYRCACCAATCAGCACTCTG (SEQ ID NO:12).

34. The nucleic acid molecule of claim 31 wherein the enhancer has
from three to fourteen repeat units.

35. A nucleic acid molecule comprising an enhancer, wherein the
enhancer is a primate 5'HS5 ERV-9 LTR enhancer.
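
The consensus repeat of SEQ ID NO:12 is written with IUPAC ambiguity codes (for example R for A or G, Y for C or T, D for A, G or T). The following minimal sketch shows one way to compare a concrete repeat against such a degenerate sequence position by position. It is illustrative only and does not define what the claims cover; the function name count_mismatches and the short test strings are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(consensus: str, repeat: str) -> int:
    """Count positions where `repeat` is not allowed by the IUPAC `consensus`.

    Both sequences must have the same length; no alignment is attempted.
    """
    consensus, repeat = consensus.upper(), repeat.upper()
    if len(consensus) != len(repeat):
        raise ValueError("sequences must have equal length")
    return sum(1 for code, base in zip(consensus, repeat) if base not in IUPAC[code])


# Short hypothetical fragments, using the first four positions of the consensus.
print(count_mismatches("TRTC", "TATC"))  # 0: A is allowed by R
print(count_mismatches("TRTC", "TGTC"))  # 0: G is allowed by R
print(count_mismatches("TRTC", "TCTC"))  # 1: C is not allowed by R
```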

1. To replace the LTRs or their component U3, R and U5 regions of retroviral vectors
designed for gene therapy of hereditary or acquired hematological diseases.

[Diagram: four LTR designs drawn as boxes, three as U3-R-U5 and one as U3E-U3p-R-U5.]

Can be either the 5' or the 3' LTR or both the 5' and 3' LTRs of an
appropriate retroviral vector.

U3: the U3 enhancer and promoter of the 5'HS5 ERV-9 LTR
R: the R region of the 5'HS5 ERV-9 LTR
U5: the U5 region of the 5'HS5 ERV-9 LTR
U3E: the U3 enhancer of the 5'HS5 ERV-9 LTR
U3p, R and U5: the U3 promoter, R and U5 regions of appropriate non-5'HS5
ERV-9 LTRs.

2. To activate in hematopoietic cells the transcription of a cis-linked transgene spliced
in either viral or non-viral vectors.

[Diagram: vector designs in which a gene is placed downstream of LTR elements,
including U3E-U3P-R-U5-Gene and U3E-P-R-U5-Gene arrangements.]

U3: the U3 enhancer and promoter of the 5'HS5 ERV-9 LTR
R: the R region of the 5'HS5 ERV-9 LTR
U5: the U5 region of the 5'HS5 ERV-9 LTR
U3E: the U3 enhancer of the 5'HS5 ERV-9 LTR
U3P: the U3 promoter of the 5'HS5 ERV-9 LTR
R and U5: the R and U5 regions of appropriate non-5'HS5
ERV-9 LTRs.
P: appropriate promoter other than the U3 promoter of the 5'HS5 ERV-9 LTR.

Figure 12

Design of the vectors:

[Diagram of vector designs not reproduced.]

Can be either the 5' or the 3' LTR or both the 5' and 3' LTRs of an
appropriate retroviral vector.

Hatched box: the U3 insulator sequence.
E and P: appropriate enhancer and promoter sequences.
gene: appropriate transgene
all other designations: the same as Figure 12

SEQUENCE LISTING

<110> Medical College of Georgia Research Institute, Inc 

<120> Long Terminal Repeat, Enhancer, and Insulator Sequences 
for Use in Recombinant Vectors 

<130> MCG 112 PCT 

<140> Not Yet Assigned 
<141> 1999-10-21 

<150> 60/105,256 
<151> 1998-10-22 

<160> 22 

<170> PatentIn Ver. 2.1

<210> 1 

<211> 1831 

<212> DNA 

<213> Homo sapiens 

<400> 1 

gtattgagag gtgacagcgt gctggcagtc ctcacagccc tcgctcgctc ttggcgcctc 60 
ctctgcctgg gctcccacat tggtggcact tgaggagccc ttcagccggc cgctgcactg 120 
tgggagccct tttctgggct ggccaaggcc agagccggct ccctcagctt gccaggaggt 180 
gtggagggac agacgcgggc aggaaccggg ctgtgcgccg tgcttgaggg agttccgggt 240
gggcatgggc tccgaggacc ccgcactcgg agccgccagc cggccccacc ggccgcgggc 300 
agtgaggggc ttagcacctg ggccagcagc tgctgtgctc aattcctcgc cgggccttag 360 
ctgccttcct gcggggcagg gctcgggacc tgcagcgcgc catgcctgag cctccccacc 420 
ttcatgggct cctgtgcggc ccgagcctcg ccgacgagcg ccgccccctg ctccagggca 480 
cccagtccca tcgaccaccc aagggctgaa gagtgcgggc gccagcaagg ggactggcag 540 
gcagctcccc ctgcagccca ggtgcgggat ccactgggtg aagccggcta ggtcctgagt 600 
ttgctgggga tgcgaagaac ccttatgtct agataaggga ttgtaaatac accaattggc 660 
actctgtatc tagctcaagg tttgtaaaca caccaatcag caccctgtgt ctagctcagg 720 
gtttgtgaat gcaccaatca acactctatc tagctactct ggtggggcct tggagaacct 780 
ttatgtctag ctcagggatt gtaaatacac caatcggcag tctgtatcta gctcaaggtt 840 
tgtaaacaca ccaatcagca ccctgtgtct agctcagggt ttgtgaatgc accaatcaac 900 
actctgtatc tagctactct ggtggggacg tggagaacct ttatgtctag ctcagggatt 960 
gtaaatacac cactcggcag tctgtatcta gctcaaggtt tgtaaacaca ccaatcagca 1020 
ccctgtgtct agctcagggt ttgtgaatgc accaatcaac actctgtatc tagctactct 1080 
ggtgggactt ggagaacctt tgtgtggaca ctctgtatct agctaatctg gtggggacgt 1140 
ggagaacctt tgtgtctagc tcatggattg taaatgcacc aatcagtgcc ctgtcaaaac 1200 
agaccactgg gctctctacc aatcagcagg atgtgggtgg ggccagataa gagaataaaa 1260
gcaggctgcc cgagccagca gtggcaaccc gctcgggtcc ccttccacac tgtggaagct 1320 
ttgttctttc gctctttgca ataaatcttg ctgctgctca ctgtttgggt ctacactgcc 1380 
tttatgagct gtaacgctca ccgcgaaggt ctgcagcttc actcttgaag ccagcgagac 1440
cacgaaccca ccggaggaac gaacaactcc agaggcgccg cttaagagct ggaacgttca 1500
ctgtgaaggt ctgcagcttc actcctgagc cagcgagacc acgaacccat cagaaggaag 1560
aactcgaaca catccaaaca tcagaacgaa caactccaca cacgcagcct ttaagaactg 1620
taacactcac cacgagggtc cccggcttca ttcttgaagt cagtgaaacc aagaacccac 1680
caattccgga cacagtatgt cagaaacaat atgagtcact aaatcaatat acttctcaac 1740
aacagccctt gcaattaact tggccatgtg actggttgtg actaaaataa tgtggagata 1800
ataatgtgtt actccctaag gcagagtgcc c 1831



<210> 2 

<211> 103 

<212> DNA 

<213> Homo sapiens 

<400> 2 

tcaaaacgga ccaataagct ctctgtaaaa tgggccaatc agcaggatgt gggtggggtc 60 
agataaggaa ataaaagcag gctgccagag ccagctgtga caa 103 



<210> 3 

<211> 87 

<212> DNA 

<213> Homo sapiens 

<400> 3 

tcaaaccact cggctctacc aatcagcagg atgtgggtgg ggccagataa gagaataaaa 60 
gcaggctgcc cgagccagca gtggcaa 87 



<210> 4 
<211> 105 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Epsilon 1.4 
phage 

<400> 4 

gacacaggtc agccttgacc aatgactttt aagtaccatg gagaacaggg ggccagaatt 60 
cggcagtaaa gaataaaagg ccagacagag aggcagcagc acata 105



<210> 5 
<211> 1091 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: consensus 
sequence 

<400> 5

tatgtctaga taagggattg taaatacacc aattggcact ctgtatctag ctcaaggttt 60 
gtaaacacac caatcagcac cctgtgtcta gctcagggtt tgtgaatgca ccaatcaaca 120 
ctctatctag ctactctggt ggggccttgg agaaccttta tgtctagctc agggattgta 180 
aatacaccaa tcggcagtct gtatctagct caaggtttgt aaacacacca atcagcaccc 240 
tgtgtctagc tcagggtttg tgaatgcacc aatcaacact ctgtatctag ctactctggt 300 
ggggacgtgg agaaccttta tgtctagctc agggattgta aatacaccac tcggcagtct 360 
gtatctagct caaggtttgt aaacacacca atcagcaccc tgtgtctagc tcagtatcta 420 
gctaatctgg tggggangtg gagaaccttt gtgtctagct catggattgt aaatgcacca 480 
atcagtgccc tgtcaaaaca gaccactggg ctcttaccaa tcagcaggat gtgggtgggg 540 
ccagataaga gaataaaagc aggctgcccg agccagcagt ggcaacccgc tcgggtcccc 600 
ttccacactg tggaagcttt gttctttcgc tctttgcaat aaatcttgct gctgctcact 660 
gtttgggtct acactgcctt tatgagctgt aacgctcacc gcgaaggtct gcagcttcac 720 
tcttgaagcc agcgagacca cgaacccacc gggaggaacg aacaactcca gaggcgccgc 780 
cttaagagct ggaacgttca ctgtgaaggt ctgcagcttc actcctgagc cagcgagacc 840 
acgaacccat cagaaggaag aaactccgaa cacatccaaa catcagaacg aacaaactcc 900 
acacacgcag cctttaagaa ctgtaacact caccacgagg gtccccggct tcattcttga 960 
agtcagtgaa accaagaacc caccaattcc ggacacagta tgtcagaaac aatatgagtc 1020
actaaatcaa tatacttctc aacaatttcc aacagccctt gcaattaact tggccatgtg 1080
actggttgtg a 1091



<210> 6 

<211> 1043 

<212> DNA 

<213> Homo sapiens 

<400> 6 

tatgtctacc ataagggatt gtaaatacac caattggcac tctgtatcta gcccaaggtt 60
tgtaaacaca ccaatcagca ccctgtgtct agctcagggt ttgtgaatgc accaatcaac 120
actctatcta gctactctgg tggggccttg gagaaccttt atgtctagct cagggattgt 180
aaatacacca atcggcagtc tgtatctagc tcaaggtttg taaacacacc aatcagcacc 240
ctgtgtctag ctcagggttt gtgaatgcac caatcaacac tctgtatcta gctactctgg 300
tggggacgtg gagaaccttt atgtctagct cagggattgt aaatacacca ctcggcagtc 360
tgtatctagc tcaaggtttg taaacacacc aatcagcacc ctgtgtctag ctcatggatt 420
gtaaatgcac caatcagtgc cctgtcaaaa cagaccactg ggctctacca atcagcagga 480
tgtgggtggg gccagataag agaataaaag caggctgccc gagccagcag tggcaacccg 540
ctcgggtccc cttccacact gtggaagctt tgttctttcg ctctttgcaa taaatcttgc 600
tgctgctcac tgtttgggtc tacactgcct ttatgagctg taacgctcac cgcgaaggtc 660
tgcagcttca ctcttgaagc cagcgagacc acgaacccac cgggaggaac gaacaactcc 720
agaggcgccg ccttaagagc tggaacgttc actggtaaag gtctgcagct tcactcctga 780
gccagcgaga ccacgaaccc atcagaagga agaaactccg aacacatcca aacatcagaa 840
cgaacaaact ccacacacgc agcctttaag aactgtaaca ctcaccacga gggtccccgg 900
cttcattctt gaagtcagtg aaaccaagaa cccaccaatt ccggacacag tatgtcagaa 960
acaatatgag tcactaaatc aatatacttc tcaacaattt ccaacagccc ttgcaattaa 1020
cttggccatg tgactggttg tga 1043



<210> 7 
<211> 801 
<212> DNA 
<213> Gorilla 

<400> 7 

tatgtctaga taagggattg taaatacacc aattggcact ctgtatctag ctcaaggttt 60
gtaaacacac caatcagcac cctgtgtcta gctcagggtt tgtgaatgca ccaatcaaca 120
ctctgtatct agctaatctg gtggggaagt ggagaacctt tgtgtctagc tcagggattg 180
taaacgcacc aatcagcacc ctgtcaaaac agaccactgg gctctaccaa tcagcaggat 240
gtgggtgggg ccagataaga gaataaaagc aggctgccca agccagcagt ggcaacgtgc 300
tcaggtcccc ttccacactg cggaagcttt gttctttcgc tctttgcaat aaatcttgct 360
gctgctcact gtttgggtct acactgcctt tacgagctat aacgctcacc cgaaggtctg 420
cagcttcact cttgaagcca gcgagaccac gaacccactg ggaggaacga acaactccag 480
acgcaccgcc ttaagagctg gaacgttcac tgtgaaggtc tgcagcttca ctcctgagcc 540
agcgagacca cgaacccatc agaaggaaga aactccgaac acatccaaac atcagaacga 600
acaaactcca cacacgcagc ctttaagaac tgtaacactc accacgaggg tcccgcggct 660
tcattcttga aagtcagtga aaccaagaac ctaccaattc ggacacagta tgtcagaaac 720
aatatgagtc actaaatcaa tatacttctc aacaatttcc aacagccctt gcaattaact 780
tggccatgtg actggttgtg a 801



<210> 8 
<211> 41 
<212> DNA 

<213> Homo sapiens 
<400> 8 

tatctagctc agggattgta aatacaccaa tcggcagtct g 



<210> 9 
<211> 41 
<212> DNA 

<213> Homo sapiens 
<400> 9 

tgtctagctc aaggtttgta aacacaccaa tcagcaccct g 



<210> 10 
<211> 41 
<212> DNA 
<213> Homo sapiens
<400> 10 

tatctagctc agggtttgtg aatgcaccaa tcaacactct g 



<210> 11 
<211> 37 
<212> DNA 

<213> Homo sapiens 



<400> 11 

tgtctagcta ctctggtggg gacgtggaga accttta 



<210> 12 
<211> 41 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: consensus 
sequence 

<400> 12 

trtctagctc adggtttgtr aayrcaccaa tcagcactct g 



<210> 13 
<211> 41 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: consensus 
sequence 

<400> 13 

tgtctagctm aaggtttgta aatgcaccaa tcagcactct g 



<210> 14 

<211> 41 

<212> DNA 

<213> Artificial Sequence



<220> 

<223> Description of Artificial Sequence: consensus 
sequence
<400> 14 

trtctagctm arggwttgta aacrcaccaa tcagcactct g 

<210> 15 
<211> 31 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
oligonucleotide 

<400> 15 

actgtcgaca agcttctgac aaattattct t 

<210> 16 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
oligonucleotide 

<400> 16 

gatggatcca ctgaaagggc tcatgcaac 

<210> 17 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
oligonucleotide 

<400> 17 

ctgagtttgc tggggatgcg aa 

<210> 18 
<211> 26 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence 
oligonucleotide 

<400> 18 

gatttagtga ctcatattgt ttctga 



<210> 19 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence 
oligonucleotide 

<400> 19 

tgctgctgct cactgtttgg gtcta 

<210> 20 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence 
oligonucleotide 

<400> 20 

gggcactctg ccttagggag taaca 

<210> 21 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence 
oligonucleotide 

<400> 21 

actgtcgact tatgtattca agttcg 

<210> 22

<211> 27 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
oligonucleotide 

<400> 22 

gatggatcca atagattttt gtcatct 27 